 Xi'an


FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning

arXiv.org Artificial Intelligence

Audio-Visual Question Answering (AVQA) is a challenging multimodal reasoning task that requires intelligent systems to accurately answer natural language queries based on paired audio-video inputs. However, existing AVQA approaches often overfit to dataset biases, leading to poor robustness. To address these challenges, we first introduce a novel dataset, FortisAVQA, constructed in two stages: (1) rephrasing questions in the test split of the public MUSIC-AVQA dataset and (2) introducing distribution shifts across questions. The first stage expands the test space with greater diversity, while the second enables a refined robustness evaluation across rare, frequent, and overall question distributions. Second, we introduce a robust Multimodal Audio-Visual Epistemic Network (MAVEN) that leverages a multifaceted cycle collaborative debiasing strategy to mitigate bias learning. Experimental results demonstrate that our architecture achieves state-of-the-art performance on FortisAVQA, with a notable improvement of 7.81%. Additionally, our evaluation reveals the limited robustness of existing multimodal QA methods. We also verify the plug-and-play capability of our strategy by integrating it with various baseline models across both datasets. Humans possess the extraordinary capacity to seamlessly integrate auditory and visual cues, effectively establishing a cohesive relationship between visual and auditory stimuli [1-3]. Jie Ma, Pinghui Wang, Jing Tao and Zhou Su are with the Ministry of Education Key Laboratory for Intelligent Networks and Network Security, School of Cyber Science and Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China. Zhitao Gao and Jun Liu are with the Shaanxi Provincial Key Laboratory of Big Data Knowledge Engineering, School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China.
Qi Chai is with the Information Hub, Hong Kong University of Science and Technology (Guangzhou), Guangzhou, Guangdong, 510000, China. The questions in current AVQA datasets are generated from a limited set of predefined templates, which may not reflect real-world scenarios. Our findings indicate that existing methods such as STG [6] are not robust, which may be attributed to excessive bias learning, such as memorizing statistical regularities between critical question words and answers. It requires the system to learn high-order interaction representations of the concepts encompassed by the audio, video, and language modalities. As is well known [8-10], the high-level reasoning ability of the system mainly relies on large-scale data that does not contain harmful biases or statistical regularities. However, completely avoiding negative bias in datasets seems challenging [11] due to the inherent skewness in real-world data distributions.


A Generative Model Enhanced Multi-Agent Reinforcement Learning Method for Electric Vehicle Charging Navigation

arXiv.org Artificial Intelligence

With the widespread adoption of electric vehicles (EVs), guiding EV drivers to a cost-effective charging station has become an important yet challenging problem due to dynamic traffic conditions, fluctuating electricity prices, and potential competition from other EVs. The state-of-the-art deep reinforcement learning (DRL) algorithms for solving this task still require global information about all EVs at the execution stage, which not only increases communication costs but also raises privacy issues among EV drivers. To overcome these drawbacks, we introduce a novel generative model-enhanced multi-agent DRL algorithm that utilizes only the EV's local information while achieving performance comparable to these state-of-the-art algorithms. Specifically, the policy network is implemented on the EV side, and a Conditional Variational Autoencoder-Long Short Term Memory (CVAE-LSTM)-based recommendation model is developed to provide recommendation information. Furthermore, a novel future charging competition encoder is designed to effectively compress global information, enhancing training performance. The multi-gradient descent algorithm (MGDA) is also utilized to adaptively balance the weight between the two parts of the training objective, resulting in a more stable training process. Simulations are conducted based on a practical area in Xi'an, China. Experimental results show that our proposed algorithm, which relies on local information, outperforms existing local information-based methods and achieves less than 8% performance loss compared to global information-based methods.
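
The MGDA balancing step mentioned in this abstract can be illustrated for the two-objective case, where the minimum-norm combination of two gradients has a well-known closed form. The snippet below is a minimal sketch only: the function names and the plain-list gradient representation are assumptions made for illustration, and the abstract does not say how the authors apply MGDA inside their training loop.

```python
def mgda_two_task_weight(g1, g2):
    """Return alpha in [0, 1] minimizing ||alpha*g1 + (1 - alpha)*g2||^2.

    g1 and g2 are flat gradient vectors (hypothetical inputs for this sketch).
    """
    diff_sq = sum((a - b) ** 2 for a, b in zip(g1, g2))
    if diff_sq == 0.0:
        return 0.5  # identical gradients: every weight gives the same result
    # Unconstrained minimizer of the quadratic, then clipped to [0, 1].
    alpha = sum((b - a) * b for a, b in zip(g1, g2)) / diff_sq
    return min(1.0, max(0.0, alpha))


def combine_gradients(g1, g2):
    """Weighted sum of the two gradients using the MGDA weight."""
    alpha = mgda_two_task_weight(g1, g2)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]
```

When the two gradients conflict, the weight moves toward the one with the smaller norm, so neither part of the objective dominates the update.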


FedDAG: Federated Domain Adversarial Generation Towards Generalizable Medical Image Analysis

arXiv.org Artificial Intelligence

Federated domain generalization aims to train a global model from multiple source domains and ensure its generalization ability to unseen target domains. Because the target domain carries unknown domain shifts, approximating these gaps with the source domains may be the key to improving model generalization capability. Existing works mainly focus on sharing and recombining local domain-specific attributes to increase data diversity and simulate potential domain shifts. However, these methods may be insufficient, since recombining only local attributes can hardly reach beyond the distribution of the global data. In this paper, we propose a simple-yet-efficient framework named Federated Domain Adversarial Generation (FedDAG). It aims to simulate the domain shift and improve the model generalization by adversarially generating novel domains different from local and global source domains. Specifically, it generates novel-style images by maximizing the instance-level feature discrepancy between original and generated images and trains a generalizable task model by minimizing their feature discrepancy. Further, we observed that FedDAG could yield different performance improvements for different local models. This may be due to inherent data isolation and heterogeneity among clients, exacerbating the imbalance in their generalization contributions to the global model. Ignoring this imbalance can leave the global model's generalization ability sub-optimal, further limiting the novel domain generation procedure. Thus, to mitigate this imbalance, FedDAG hierarchically aggregates local models at the within-client and across-client levels by using the sharpness concept to evaluate client model generalization contributions. Extensive experiments across four medical benchmarks demonstrate FedDAG's ability to enhance generalization in federated medical scenarios.
With the continuous advancement of medical research and clinical practice, the medical field generates substantial data [1], [2]. This work was supported by the National Natural Science Foundation of China (No. 62202403), Hong Kong Innovation and Technology Fund (Project No. MHP/002/22) and Shenzhen Science and Technology Innovation Committee Fund (Project No. SGDX20210823103201011). H. Che and H. Jin are with the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology, Hong Kong SAR, China. Xia are with the National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China.


Erasing Noise in Signal Detection with Diffusion Model: From Theory to Application

arXiv.org Artificial Intelligence

In this paper, a signal detection method based on the denoising diffusion model (DM) is proposed, which outperforms the maximum likelihood (ML) estimation method that has long been regarded as the optimal signal detection technique. Theoretically, a novel mathematical theory for intelligent signal detection based on stochastic differential equations (SDEs) is established in this paper, demonstrating the effectiveness of DM in reducing the additive white Gaussian noise in received signals. Moreover, a mathematical relationship between the signal-to-noise ratio (SNR) and the timestep in DM is established, revealing that for any given SNR, a corresponding optimal timestep can be identified. Furthermore, to address potential issues with out-of-distribution inputs in the DM, we employ a mathematical scaling technique that allows the trained DM to handle signal detection across a wide range of SNRs without any fine-tuning. Xiucheng Wang, Peilin Zheng, Nan Cheng are with the State Key Laboratory of ISN and School of Telecommunications Engineering, Xidian University, Xi'an 710071, China. Signal detection plays a critical role in digital baseband transmission, since it estimates which symbols were transmitted by the sender from the noisy received signals. Thus, the performance of signal detection directly impacts the symbol error rate (SER) of data transmission, which in turn determines the error-free transmission rate, also known as the Shannon threshold [1]. As a result, numerous signal detection techniques have been developed to minimize the SER and bring the transmission rate as close as possible to the Shannon threshold.
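
One way to picture the SNR-to-timestep relationship is with a standard variance-preserving diffusion, in which the noisy sample at step t is sqrt(ab_t) * x0 + sqrt(1 - ab_t) * eps and its SNR is ab_t / (1 - ab_t). The sketch below rests on assumptions that do not come from the abstract: a linear beta schedule with 1000 steps, unit-power transmitted symbols, and hypothetical function names. It is not necessarily the derivation used in the paper. Under those assumptions, a received signal with a given SNR matches the step whose cumulative product ab_t is closest to SNR / (1 + SNR), after the signal is scaled by the square root of that target.

```python
import math


def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def timestep_for_snr(snr_db, alpha_bars):
    """Return (step index, input scale) matching a received-signal SNR in dB.

    Assumes unit-power symbols plus additive white Gaussian noise.
    """
    snr = 10.0 ** (snr_db / 10.0)
    target = snr / (1.0 + snr)  # alpha_bar value with the same SNR
    step = min(range(len(alpha_bars)), key=lambda i: abs(alpha_bars[i] - target))
    return step, math.sqrt(target)
```

A cleaner signal (higher SNR) maps to an earlier step, so fewer denoising steps are needed, while the scale factor keeps the input inside the range the model saw during training.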


Scalable Multi-Objective Reinforcement Learning with Fairness Guarantees using Lorenz Dominance

arXiv.org Artificial Intelligence

Multi-Objective Reinforcement Learning (MORL) aims to learn a set of policies that optimize trade-offs between multiple, often conflicting objectives. MORL is computationally more complex than single-objective RL, particularly as the number of objectives increases. Additionally, when objectives involve the preferences of agents or groups, ensuring fairness is socially desirable. This paper introduces a principled algorithm that incorporates fairness into MORL while improving scalability to many-objective problems. We propose using Lorenz dominance to identify policies with equitable reward distributions and introduce λ-Lorenz dominance to enable flexible fairness preferences. We release a new, large-scale real-world transport planning environment and demonstrate that our method encourages the discovery of fair policies, showing improved scalability in two large cities (Xi'an and Amsterdam). Our methods outperform common multi-objective approaches, particularly in high-dimensional objective spaces.
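
Lorenz dominance, as referred to above, has a compact standard definition: sort each return vector in ascending order, take cumulative sums, and compare the resulting Lorenz vectors with ordinary Pareto dominance. The following is a minimal sketch assuming all objectives are maximized; the function names are illustrative, and the λ-Lorenz variant introduced in the paper is not covered here.

```python
from itertools import accumulate


def lorenz_vector(returns):
    """Cumulative sums of the returns sorted from worst-off to best-off."""
    return list(accumulate(sorted(returns)))


def pareto_dominates(a, b):
    """True if a is at least as good everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def lorenz_dominates(a, b):
    """True if a's Lorenz vector Pareto-dominates b's Lorenz vector."""
    return pareto_dominates(lorenz_vector(a), lorenz_vector(b))


def lorenz_front(candidates):
    """Keep the return vectors that no other candidate Lorenz-dominates."""
    return [c for c in candidates
            if not any(lorenz_dominates(other, c) for other in candidates)]
```

For example, the return vectors [4, 4] and [7, 1] are incomparable under Pareto dominance, but [4, 4] Lorenz-dominates [7, 1] because the same total is spread more evenly. This is why the Lorenz front is typically smaller than the Pareto front.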


A Synthetic Analysis for Xi'an

Neural Information Processing Systems

XGBoost: This method takes travel information (e.g., distance, departure time, etc.) for each trip as input, and then estimates the travel time using an ensemble learning approach. WDR: This is a popular travel time estimation method, which estimates travel time through a combination of a wide network, a deep network, and a recurrent network. DeepTTE: This is one of the representative travel time estimation methods. It first converts the original GPS trajectory into a series of high-dimensional features, and then applies an RNN to capture the spatial-temporal dependence.


Pixel Distillation: A New Knowledge Distillation Scheme for Low-Resolution Image Recognition

arXiv.org Artificial Intelligence

Previous knowledge distillation (KD) methods mostly focus on compressing network architectures, which is not thorough enough in deployment as some costs like transmission bandwidth and imaging equipment are related to the image size. Therefore, we propose Pixel Distillation that extends knowledge distillation into the input level while simultaneously breaking architecture constraints. Such a scheme can achieve flexible cost control for deployment, as it allows the system to adjust both network architecture and image quality according to the overall requirement of resources. Specifically, we first propose an input spatial representation distillation (ISRD) mechanism to transfer spatial knowledge from large images to the student's input module, which can facilitate stable knowledge transfer between CNN and ViT. Then, a Teacher-Assistant-Student (TAS) framework is further established to disentangle pixel distillation into the model compression stage and input compression stage, which significantly reduces the overall complexity of pixel distillation and the difficulty of distilling intermediate knowledge. Finally, we adapt pixel distillation to object detection via an aligned feature for preservation (AFP) strategy for TAS, which aligns output dimensions of detectors at each stage by manipulating features and anchors of the assistant. Comprehensive experiments on image classification and object detection demonstrate the effectiveness of our method. To deal with this situation, KD techniques that aim at using smaller network architectures received great attention in the past few years. Figure 1: (a) Compared to network architecture, input size has an impact on more kinds of costs, including requirements for cameras and transmission bandwidth. Guangyu Guo is with Brain and Artificial Intelligence Laboratory, School of Automation, Northwestern Polytechnical University, Xi'an, China.
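
For readers unfamiliar with the "previous KD methods" this abstract builds on, the classic logit-based distillation loss can be sketched as a temperature-softened KL divergence between teacher and student predictions. This illustrates conventional KD only, not the ISRD, TAS, or AFP components of Pixel Distillation; the function names and the default temperature of 4.0 are assumptions chosen for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs, scaled by temperature^2."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return kl * temperature ** 2
```

In the setting described in the abstract, the student would additionally receive a lower-resolution input than the teacher, which this sketch does not model.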


Behold, a giant AI-generated rat dick

Mashable

Earlier this week, scientific journal Frontiers in Cell and Developmental Biology published a paper entitled "Cellular functions of spermatogonial stem cells in relation to JAK/STAT signaling pathway." In it, three researchers from Xi'an Honghui Hospital and Xi'an Jiaotong University aimed to summarise current research on sperm stem cells. They also showed off an absolutely enormous, wildly anatomically incorrect, AI-generated rat dick. The article featured three apparently illustrative images, all of which were created by AI art generator Midjourney, and all of which were blatantly incorrect. The most obvious errors concerned the Rodent of Unusual Size depicted in the article's first figure.


Multitask Weakly Supervised Learning for Origin Destination Travel Time Estimation

arXiv.org Artificial Intelligence

Travel time estimation from GPS trips is of great importance to order duration, ridesharing, taxi dispatching, etc. However, the dense trajectory is not always available due to the limitation of data privacy and acquisition, while the origin-destination (OD) type of data, such as NYC taxi data, NYC bike data, and Capital Bikeshare data, is more accessible. To address this issue, this paper estimates the travel time of OD trips by combining them with the road network. Subsequently, a Multitask Weakly Supervised Learning Framework for Travel Time Estimation (MWSL-TTE) is proposed to infer the transition probability between road segments, and the travel time on road segments and intersections simultaneously. Technically, given an OD pair, the transition probability is used to recover the most probable route. The output travel time then equals the sum of the travel times of all segments and intersections on this route. A novel route recovery function is proposed to iteratively maximize the current route's co-occurrence probability, and minimize the discrepancy between the routes' probability distribution and the inverse distribution of the routes' estimation loss. Moreover, the expected log-likelihood function based on a weakly supervised framework is deployed to optimize the travel time from road segments and intersections concurrently. We conduct experiments on a wide range of real-world taxi datasets in Xi'an and Chengdu and demonstrate our method's effectiveness on route recovery and travel time estimation.
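
The two inference steps described in this abstract (recover the most probable route from segment-to-segment transition probabilities, then sum segment and intersection times along it) can be sketched as follows. This is a simplified stand-in rather than the paper's iterative route recovery function: it swaps in Dijkstra's algorithm on negative log-probabilities, and the dictionary layouts and function names are assumptions made for illustration.

```python
import heapq
import math


def most_probable_route(transitions, origin, destination):
    """Route maximizing the product of transition probabilities.

    transitions[a][b] is the (hypothetical) probability of moving from
    road segment a to road segment b.
    """
    best = {origin: 0.0}
    queue = [(0.0, origin, [origin])]
    while queue:
        cost, segment, route = heapq.heappop(queue)
        if segment == destination:
            return route
        if cost > best.get(segment, math.inf):
            continue  # stale queue entry
        for nxt, prob in transitions.get(segment, {}).items():
            if prob <= 0.0:
                continue
            new_cost = cost - math.log(prob)  # smaller cost = more probable
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt, route + [nxt]))
    return None  # destination unreachable


def route_travel_time(route, segment_time, intersection_time):
    """Sum of segment times plus the intersection times between them."""
    total = sum(segment_time[s] for s in route)
    total += sum(intersection_time[(a, b)] for a, b in zip(route, route[1:]))
    return total
```

In the framework itself the segment times, intersection times, and transition probabilities are learned jointly from OD data; here they are simply passed in as lookup tables.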


Faculty Position in Artificial Intelligence and Advanced Computing job with XIAN JIAOTONG LIVERPOOL UNIVERSITY (XJTLU)

#artificialintelligence

In 2006 Xi'an Jiaotong-Liverpool University (XJTLU) was created by the University of Liverpool and Xi'an Jiaotong University, a top ten university in China. Offering a unique international education experience, XJTLU brings together excellent research practice and expertise from both institutions and gives students the skills and knowledge they need to secure careers in a global marketplace. XJTLU now has over 25,000 enrolled students in both Suzhou and Liverpool in the UK, with plans to grow to about 28,000 students by 2025. There are currently about 2,000 staff, of whom about 1,000 are academic staff, with an almost even split between citizens of the People's Republic of China and international passport holders. XJTLU offers our undergraduates and postgraduates over 100 programmes with a diverse spectrum of courses.